Esta data representa el interés en diferentes tipos gustos sobre películas:
La data disponible tiene tiene estas características:
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Horror | 1 | 958 | 2.764092 | 1.4070597 | 3 | 2.705729 | 1.4826 | 1 | 5 | 4 | 0.1954091 | -1.2507644 | 0.0454600 |
| Thriller | 2 | 958 | 3.365344 | 1.2008920 | 4 | 3.430990 | 1.4826 | 1 | 5 | 4 | -0.3419797 | -0.8303760 | 0.0387991 |
| Comedy | 3 | 958 | 4.500000 | 0.7808443 | 5 | 4.662760 | 0.0000 | 1 | 5 | 4 | -1.6312257 | 2.4132180 | 0.0252279 |
| Romantic | 4 | 958 | 3.484342 | 1.2117783 | 4 | 3.559896 | 1.4826 | 1 | 5 | 4 | -0.3312922 | -0.8692307 | 0.0391508 |
| SciFi | 5 | 958 | 3.094990 | 1.3132060 | 3 | 3.118490 | 1.4826 | 1 | 5 | 4 | -0.0424614 | -1.1142428 | 0.0424278 |
| War | 6 | 958 | 3.132568 | 1.3453600 | 3 | 3.165365 | 1.4826 | 1 | 5 | 4 | -0.0566352 | -1.1557596 | 0.0434666 |
| Fantasy | 7 | 958 | 3.744259 | 1.1861018 | 4 | 3.858073 | 1.4826 | 1 | 5 | 4 | -0.5480089 | -0.7192661 | 0.0383212 |
| Animated | 8 | 958 | 3.771399 | 1.2321606 | 4 | 3.906250 | 1.4826 | 1 | 5 | 4 | -0.6579499 | -0.6674859 | 0.0398093 |
| Documentary | 9 | 958 | 3.633612 | 1.1348004 | 4 | 3.721354 | 1.4826 | 1 | 5 | 4 | -0.4794855 | -0.5959519 | 0.0366637 |
| Action | 10 | 958 | 3.514614 | 1.2340076 | 4 | 3.602865 | 1.4826 | 1 | 5 | 4 | -0.4085078 | -0.8815674 | 0.0398690 |
Para lograr saber si hay alguna manera aceptable de reducir la dimensionalidad de los datos, demos verificar ciertos comportamientos:
La matriz de correlación es la base de esta etapa. Observemosla:
Si hay correlaciones entre ciertas variables, hay esperanzas de una buena reducción de dimensiones.
Para ello debemos calcular en indice de Kaiser-Meyer-Olkin:
| MSA | |
|---|---|
| Horror | 0.5450566 |
| Thriller | 0.6310535 |
| Comedy | 0.5780425 |
| Romantic | 0.7132902 |
| SciFi | 0.7393117 |
| War | 0.7501253 |
| Fantasy | 0.5786192 |
| Animated | 0.5661427 |
| Documentary | 0.6126153 |
| Action | 0.6593483 |
| Global MSA | 0.6238825 |
Aqui hay dos pruebas:
## [1] FALSE
library(matrixcalc)
is.singular.matrix(corMatrix)
## [1] FALSE
fa.parallel(corMatrix,fm = 'ML', fa = 'fa',n.obs = nrow(theData),show.legend=F)
## Parallel analysis suggests that the number of factors = 4 and the number of components = NA
Como se sugieren 3, veamos:
library(GPArotation)
movieF_result <- fa(theData,nfactors = 3,cor = 'mixed',rotate = "varimax",fm="ML")
##
## mixed.cor is deprecated, please use mixedCor.
print(movieF_result$loadings)
##
## Loadings:
## ML2 ML1 ML3
## Horror 0.996
## Thriller 0.548 0.327
## Comedy 0.331 0.124
## Romantic 0.406 -0.172 -0.311
## SciFi 0.154 0.535
## War 0.132 0.525
## Fantasy 0.920 -0.146
## Animated 0.810
## Documentary 0.135 -0.119 0.356
## Action 0.123 0.616
##
## ML2 ML1 ML3
## SS loadings 1.809 1.431 1.284
## Proportion Var 0.181 0.143 0.128
## Cumulative Var 0.181 0.324 0.452
print(movieF_result$loadings,cutoff = 0.4)
##
## Loadings:
## ML2 ML1 ML3
## Horror 0.996
## Thriller 0.548
## Comedy
## Romantic 0.406
## SciFi 0.535
## War 0.525
## Fantasy 0.920
## Animated 0.810
## Documentary
## Action 0.616
##
## ML2 ML1 ML3
## SS loadings 1.809 1.431 1.284
## Proportion Var 0.181 0.143 0.128
## Cumulative Var 0.181 0.324 0.452
Cuando logramos que cada variable se vaya a un factor, tenemos una estructura simple.
fa.diagram(movieF_result)
Revisamos las comunalidades para ver cuanto han contribuido las variables al proceso en total:
sort(movieF_result$communalities)
## Comedy Documentary Romantic War SciFi Action
## 0.1248762 0.1592929 0.2913735 0.3003221 0.3101003 0.3989526
## Thriller Animated Fantasy Horror
## 0.4090129 0.6669207 0.8676311 0.9950000
Su opuesto es la unicidad, lo que no comparten en el proceso:
# mientras mas grande peor (lo que mantiene)
sort(movieF_result$uniquenesses)
## Horror Fantasy Animated Thriller Action SciFi
## 0.004999254 0.132368874 0.333079260 0.590986809 0.601048729 0.689898300
## War Romantic Documentary Comedy
## 0.699678020 0.708625994 0.840706232 0.875124356
La complejidad nos permite saber si una variable le pertenecería a un sólo factor, o no:
sort(movieF_result$complexity)
## Horror Animated Fantasy Action SciFi War
## 1.004181 1.033111 1.050148 1.100505 1.164864 1.185942
## Comedy Documentary Thriller Romantic
## 1.278071 1.523292 1.648158 2.267940
Es decir, planteamos que esto existe:
modelCFA <- '
# measurement model
A_Solas =~ War + Action + SciFi + Documentary
En_Pareja =~ Horror + Thriller
En_Grupo =~ Fantasy + Animated + Comedy + Romantic
'
Podemos indicar cuáles son las ordinales:
ORDINALES=c("Horror","Thriller","Comedy","Romantic", "SciFi","War","Fantasy","Animated","Documentary","Action")
Luego, vemos que produce el modelo:
cfaFIT=cfa(modelCFA,
data=theData,
ordered = ORDINALES)
allParamCFA=parameterEstimates(cfaFIT,standardized = T)
kable(allParamCFA[allParamCFA$op=="=~",])
| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A_Solas | =~ | War | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5470044 | 0.5470044 | 0.5470044 |
| A_Solas | =~ | Action | 1.2034374 | 0.0985248 | 12.214561 | 0e+00 | 1.0103323 | 1.3965425 | 0.6582856 | 0.6582856 | 0.6582856 |
| A_Solas | =~ | SciFi | 1.0626296 | 0.0954379 | 11.134253 | 0e+00 | 0.8755748 | 1.2496845 | 0.5812631 | 0.5812631 | 0.5812631 |
| A_Solas | =~ | Documentary | 0.3840705 | 0.0765992 | 5.014030 | 5e-07 | 0.2339389 | 0.5342021 | 0.2100883 | 0.2100883 | 0.2100883 |
| En_Pareja | =~ | Horror | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5538075 | 0.5538075 | 0.5538075 |
| En_Pareja | =~ | Thriller | 1.8197169 | 0.2756180 | 6.602314 | 0e+00 | 1.2795155 | 2.3599184 | 1.0077729 | 1.0077729 | 1.0077729 |
| En_Grupo | =~ | Fantasy | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.9808019 | 0.9808019 | 0.9808019 |
| En_Grupo | =~ | Animated | 0.7675995 | 0.0494625 | 15.518817 | 0e+00 | 0.6706548 | 0.8645442 | 0.7528630 | 0.7528630 | 0.7528630 |
| En_Grupo | =~ | Comedy | 0.3632286 | 0.0414612 | 8.760678 | 0e+00 | 0.2819661 | 0.4444912 | 0.3562553 | 0.3562553 | 0.3562553 |
| En_Grupo | =~ | Romantic | 0.4726677 | 0.0364832 | 12.955771 | 0e+00 | 0.4011620 | 0.5441735 | 0.4635934 | 0.4635934 | 0.4635934 |
Graficamente:
library(semPlot)
semPaths(cfaFIT,color='black')
O…
semPaths(cfaFIT, what='std', nCharNodes=6, sizeMan=5,
edge.label.cex=1.25, curvePivot = TRUE, fade=FALSE,color='black')
Qué tan buena es nuestra propuesta:
allFitCFA=as.list(fitMeasures(cfaFIT))
allFitCFA[c("chisq", "df", "pvalue")] # pvalue>0.05
## $chisq
## [1] 346.596
##
## $df
## [1] 32
##
## $pvalue
## [1] 0
allFitCFA$tli # > 0.90
## [1] 0.9098774
allFitCFA[c('rmsea.ci.lower','rmsea' ,'rmsea.ci.upper')] # 0.05 en el Int de Conf?
## $rmsea.ci.lower
## [1] 0.09187005
##
## $rmsea
## [1] 0.1013551
##
## $rmsea.ci.upper
## [1] 0.1111277
Aquí queremos plantear una regresió combinando lo observado y la latente:
Este puede ser un modelo:
modelSEM <- '
# measurement model
A_Solas =~ War + Action + SciFi + Documentary
En_Pareja =~ Horror + Thriller
En_Grupo =~ Fantasy + Animated + Comedy + Romantic
# regressions
A_Solas ~ En_Pareja + En_Grupo
'
Si obtenemos la regresión:
semFIT <- sem(modelSEM,
data=theData,
ordered = ORDINALES)
Estos son los resultados:
allParamSEM=parameterEstimates(semFIT,standardized = T)
kable(allParamSEM[allParamSEM$op=="~",])
| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | A_Solas | ~ | En_Pareja | 0.4154710 | 0.0460657 | 9.019104 | 0.0000000 | 0.3251839 | 0.505758 | 0.4206386 | 0.4206386 | 0.4206386 |
| 12 | A_Solas | ~ | En_Grupo | -0.0292517 | 0.0238584 | -1.226053 | 0.2201787 | -0.0760134 | 0.017510 | -0.0524495 | -0.0524495 | -0.0524495 |
kable(allParamSEM[allParamSEM$op=="=~",])
| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A_Solas | =~ | War | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5470042 | 0.5470042 | 0.5470042 |
| A_Solas | =~ | Action | 1.2034377 | 0.0985249 | 12.214554 | 0e+00 | 1.0103325 | 1.3965430 | 0.6582856 | 0.6582856 | 0.6582856 |
| A_Solas | =~ | SciFi | 1.0626299 | 0.0954380 | 11.134248 | 0e+00 | 0.8755749 | 1.2496848 | 0.5812631 | 0.5812631 | 0.5812631 |
| A_Solas | =~ | Documentary | 0.3840704 | 0.0765992 | 5.014027 | 5e-07 | 0.2339387 | 0.5342020 | 0.2100881 | 0.2100881 | 0.2100881 |
| En_Pareja | =~ | Horror | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5538079 | 0.5538079 | 0.5538079 |
| En_Pareja | =~ | Thriller | 1.8197146 | 0.2756176 | 6.602317 | 0e+00 | 1.2795141 | 2.3599151 | 1.0077724 | 1.0077724 | 1.0077724 |
| En_Grupo | =~ | Fantasy | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.9808015 | 0.9808015 | 0.9808015 |
| En_Grupo | =~ | Animated | 0.7675999 | 0.0494625 | 15.518820 | 0e+00 | 0.6706551 | 0.8645446 | 0.7528632 | 0.7528632 | 0.7528632 |
| En_Grupo | =~ | Comedy | 0.3632288 | 0.0414613 | 8.760679 | 0e+00 | 0.2819662 | 0.4444914 | 0.3562554 | 0.3562554 | 0.3562554 |
| En_Grupo | =~ | Romantic | 0.4726679 | 0.0364832 | 12.955773 | 0e+00 | 0.4011622 | 0.5441736 | 0.4635934 | 0.4635934 | 0.4635934 |
Visualmente:
Qué tan bueno es el modelo?
allFitSEM[c("chisq", "df", "pvalue")] # pvalue>0.05
## $chisq
## [1] 346.596
##
## $df
## [1] 32
##
## $pvalue
## [1] 0
allFitSEM$tli # > 0.90
## [1] 0.9098774
allFitSEM[c( 'rmsea.ci.lower','rmsea','rmsea.ci.upper')] # toca 0.08
## $rmsea.ci.lower
## [1] 0.09187005
##
## $rmsea
## [1] 0.1013551
##
## $rmsea.ci.upper
## [1] 0.1111277